ViSpec: a graphical tool for elicitation of MTL requirements


Bardh Hoxha, Nikolaos Mavridis and Georgios Fainekos


Abstract: One of the main barriers preventing widespread
use of formal methods is the elicitation of formal specifications.
Formal specifications facilitate the testing and verification process
for safety critical robotic systems. However, handling the
intricacies of formal languages is difficult and requires a high
level of expertise in formal logics that many system developers
do not have. In this work, we present a graphical tool designed
for the development and visualization of formal specifications by
people who do not have training in formal logic. The tool enables
users to develop specifications using a graphical formalism
which is then automatically translated to Metric Temporal
Logic (MTL). In order to evaluate the effectiveness of our tool,
we have also designed and conducted a usability study with
cohorts from the academic student community and industry.
Our results indicate that both groups were able to define formal
requirements with high levels of accuracy. Finally, we present
applications of our tool for defining specifications for the operation
of robotic surgery and the safe operation of autonomous quadcopters.

I. Introduction

As robots become commercially available, their correct
operation is of paramount importance. For safety critical
systems in particular, such as autonomous vehicles [24] and
medical robots [17], [13], safety must be guaranteed.

Safety requirements are usually expressed in natural language,
which is inherently ambiguous. When natural language is
used for defining system specifications, this ambiguity may
lead to misunderstandings between development teams, which
may in turn result in increased costs and delays in development. If
the misunderstandings are not detected, then a product that
does not meet the intended specifications will be developed.

Ideally, specifications should be defined in a mathematical
language, using formal logics. This not only removes ambiguity,
but also allows system developers to utilize a vast set
of methods [22] that have been developed by the academic
community for testing and verification of systems. The academic
community has also developed automatic tools such as
S-TaLiRo [2], [11], Papas [25], SpaceEx [9], CheckMate
[19], Flow* [4], Breach [6], C2E2 [7], KeYmaera [18] and
Strong [5] that enable developers to conduct system testing
and verification.

Even though it has been shown that utilizing formal
specifications can lead to improved testing and verification
[8], the industry still utilizes natural language as the primary
approach in defining specifications. One may conjecture
that the most important reason for doing so is that the
development of specifications through a formal logic requires
a level of mathematical training that many users may not
have [23]. Furthermore, even for expert users, writing formal
specifications is an error-prone task [10]. As a result, the
industry has been less willing to utilize formal specifications
in its processes.

In this work, we present a graphical formalism that enables
non-expert users to develop formal specifications for control
systems. The formalism enables the visualization of a large
fragment of MTL. The main challenge in the development
of the formalism lies in finding the right balance between
expressive power and ease-of-use. It is designed for use with
systems and signals and enables both event-based and time-based
specifications. To the best of our knowledge, this is the first
time that a visual formal language representation is developed
for specifications for Cyber-Physical Systems (CPS). Here, by
CPS we mean any system that has discontinuous nonlinear
dynamics and complex safety critical requirements. Prime
examples are medical robotics and autonomous vehicles. A
specification visualization tool has been developed based on
the graphical formalism presented in this work. To evaluate
the tool in terms of usability and ease-of-use, we have
conducted a usability study.

Summary of Contributions: 

• We present a graphical formalism that enables the 
development of formal specifications. 

• We present the visual specification tool based on the 
graphical formalism. 

• We conducted a usability study to evaluate the tool. 

• Through the usability study, we show that both non-expert
users and expert users are able to define formal
requirements accurately using the tool, and we derive
suggestions for improvement of the tool.

• We present applications of the tool for real-world robots. 

Related works: In order to help address the formal 
specification challenge, various graphical formalisms have 
been studied in the past [20], [1], [15], [3], [26], [21]. The 
most relevant works appear in [3] and [26]. In [3], the authors 
extend Message Sequence Charts and UML 2.0 Interaction 
Sequence Diagrams to propose a scenario based formalism 
called Property Sequence Chart (PSC). The formalism is 
mainly developed for specifications on concurrent systems. 
In [26], PSC is extended to Timed PSC which enables the 
addition of timing constructs to specifications. 

In terms of usability studies for formal requirements very 
few works exist. In [23], the authors study the ability 
of expert users to develop requirements in Z. A related 
usability study for requirement representation is presented 
in [16], where the authors present and evaluate a system 




Fig. 1: Overview of the graphical user interface of the MTL specification tool. The example shown represents the MTL
specification $\phi = \Box_{[0,40]}((speed < 80) \rightarrow \Box_{[0,40]}(rpm < 4000))$.


for generating, troubleshooting and executing controllers for 
robots using natural language. 

II. Visual Specification Tool 

The Visual Specification Tool (ViSpec) enables the
development of formal specifications for CPS. Users can
develop requirements in a graphical formalism which is then
translated to Metric Temporal Logic (MTL) [14].

The topic of capturing requirements through graphical
formalisms has been studied in the past [20], [1], [15], [3],
[26]. However, to the best of the authors' knowledge, the work
presented here is the first attempt aimed specifically at
the development of specifications for CPS. The initial
idea for the graphical formalism was first presented in [11]
while the tool was still in the early stages of development.
In this work, we present an updated version of the
tool along with its usability study. The improvements over
the previous version include: a more streamlined interface;
an updated representation of signals in the interface; and an
updated template definition process.

For CPS specifications, it is often necessary to account for
both timing and event sequence occurrence. Both of these are
necessary for reasoning over systems and signals. Consider
the specification $\Box_{[0,5]}((speed > 100) \rightarrow \Box_{[0,5]}(rpm > 4000))$.
It states that whenever, within the first 5 seconds,
the vehicle speed goes over 100, then from that moment
on, the engine speed (rpm) should always be over 4000
for the next 5 seconds. Here both the sequence and timing of
the events are of critical importance.
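
The nested reading of such a requirement can be made concrete with a small sketch. The Python code below is only an illustration and is not part of ViSpec: it assumes signals sampled once per second, treats interval bounds as sample indices, clips windows at the end of the trace, and uses the hypothetical helper names `always` and `response_holds`.

```python
from typing import Callable, Sequence


def always(check: Callable[[int], bool], n: int, t: int, lo: int, hi: int) -> bool:
    """Bounded 'always': check(k) holds for every sample k in [t+lo, t+hi].

    The window is clipped to the last sample n-1 (an assumption of this sketch).
    """
    return all(check(k) for k in range(t + lo, min(t + hi, n - 1) + 1))


def response_holds(speed: Sequence[float], rpm: Sequence[float]) -> bool:
    """Check always_[0,5]((speed > 100) -> always_[0,5](rpm > 4000)) at time 0."""
    n = len(speed)

    def implication(t: int) -> bool:
        if speed[t] <= 100:  # antecedent is false, so the implication holds
            return True
        # The inner window is relative to the moment the antecedent is true.
        return always(lambda k: rpm[k] > 4000, n, t, 0, 5)

    return always(implication, n, 0, 0, 5)


# Hypothetical traces, one sample per second.
speed = [90, 95, 105, 110, 90, 80, 80, 80, 80, 80, 80]
rpm_ok = [3000, 3000, 4500, 4500, 4500, 4500, 4500, 4500, 4500, 3000, 3000]
rpm_bad = [3000, 3000, 4500, 4500, 4500, 4500, 4500, 4500, 3000, 3000, 3000]

print(response_holds(speed, rpm_ok))   # True
print(response_holds(speed, rpm_bad))  # False
```

In the second trace the speed exceeds 100 at second 3, but the engine speed drops at second 8, which is still inside the 5-second window that starts at second 3, so the requirement is violated.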

To ensure that the tool can be utilized by non-expert users,
the following goals for the tool are defined: 1) The user
interface is intuitive to use, i.e., it does not have a steep learning
curve; 2) The visual representation of the requirements is
visually distinct and unambiguous; 3) There is a one-to-one
mapping between the visual representation of the requirement
and the corresponding requirement in MTL.

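The set of specifications that can be generated from this
graphical formalism is a proper subset of the set of MTL
specifications. Formally, the following grammar produces
the set of formulas that can be expressed by the proposed
graphical formalism:

\[
\begin{aligned}
S &\rightarrow \neg T \mid T \\
T &\rightarrow A \mid B \mid C \\
A &\rightarrow P \mid (P \wedge A) \mid (P \rightarrow A) \\
B &\rightarrow \Box_{I} D \mid \Diamond_{I} D \\
C &\rightarrow \Box_{I} \Diamond_{I} D \mid \Diamond_{I} \Box_{I} D \\
D &\rightarrow p \mid (p \rightarrow A) \mid (p \wedge A) \mid (p \rightarrow B) \mid (p \wedge B) \\
P &\rightarrow p \mid \Box_{I} p \mid \Diamond_{I} p
\end{aligned}
\]

where $p$ is an atomic proposition and $I$ is a timing interval.
In practice, the atomic propositions are automatically derived
from the templates.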
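
As an illustration of how the grammar generates a nested requirement, consider the formula $\Box_{[0,40]}((speed < 80) \rightarrow \Box_{[0,40]}(rpm < 4000))$. The derivation below is a sketch that assumes the productions $B \rightarrow \Box_{I} D$ and $D \rightarrow p \mid (p \rightarrow B)$, and writes $p_1$ for $(speed < 80)$ and $p_2$ for $(rpm < 4000)$; the names $p_1$ and $p_2$ are introduced here only for readability.

```latex
\[
\begin{aligned}
S &\Rightarrow T \Rightarrow B \\
  &\Rightarrow \Box_{[0,40]} D \\
  &\Rightarrow \Box_{[0,40]} (p_1 \rightarrow B) \\
  &\Rightarrow \Box_{[0,40]} (p_1 \rightarrow \Box_{[0,40]} D) \\
  &\Rightarrow \Box_{[0,40]} (p_1 \rightarrow \Box_{[0,40]} p_2)
\end{aligned}
\]
```

Each step applies exactly one production, so the formula belongs to the fragment.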

Throughout the development process of the formalism,
we noticed that the more expressive the formalism became, the
more challenging it was to use. Therefore, we focused
on several widely used classes of specifications, which are
described in Table I. Examples of the classes of specifications
are presented in the rest of this section.

To make the tool easier to use, we placed several constraints
on the types of signals used. Specifically, the signals
and requirements are one dimensional. This enables clear and
structured visualization on a two dimensional user interface.

In Fig. 1, the user interface of the tool is presented
along with its most critical components. The user interface
is composed of a menu, a horizontal timeline, rectangular
blocks called templates, and a zoom scroll. While the passage
of time is represented horizontally, the sequence of events
is presented vertically. The formulas are generated from
templates as well as the connections between them.

The main building blocks of the formalism are templates. 
These are used for defining temporal logic operators, their 
timing intervals, and the expected signal shape. The user 


























TABLE I: Classes of specifications expressible with the graphical formalism

| Specification Class | Explanation |
|---|---|
| Safety | Specifications of the form $\Box\phi$, used to define specifications where $\phi$ should always be true. |
| Reachability | Specifications of the form $\Diamond\phi$, used to define specifications where $\phi$ should be true at least once in the future (or now). |
| Stabilization | Specifications of the form $\Diamond\Box\phi$, used to define specifications where, at least once, $\phi$ should be true and, from that point on, stay true. |
| Recurrence | Specifications of the form $\Box\Diamond\phi$, used to define specifications where it is always the case that, at some point in the future, $\phi$ is true. |
| Implication | Specifications of the form $\phi \rightarrow \psi$, which require that $\psi$ should hold when $\phi$ is true. |
| Reactive Response | Specifications of the form $N(\phi \rightarrow M\psi)$, where $N$ and $M$ are temporal operators, used to define an implicative response between two specifications where the timing of $M$ is relative to the timing of $N$. |
| Conjunction | Specifications of the form $\phi \wedge \psi$, used to define the conjunction of two sub-specifications. |
| Non-strict Sequencing | Specifications of the form $N(\phi \wedge M\psi)$, where $N$ and $M$ are temporal operators, used to define a conjunction between two specifications where the timing of $M$ is relative to the timing of $N$. |


starts with an empty template and a setup assistant presents
the user with a sequence of dialog boxes that aid in the
development of the template. The process is context dependent:
each option selection leads to a potentially different
set of options for the next step.

The first step in the template definition process is to
define the temporal operator. Among the choices (and their
corresponding MTL symbols) are: Always ($\Box$), At Least
Once ($\Diamond$), Eventually Always ($\Diamond\Box$), Repeatedly Often and
Finally ($\Box\Diamond$), and Now. The options available enable users to
define a wide range of specifications. The following sections
will present examples of a subset of formulas that can be
generated using this graphical formalism.

After the temporal operator is selected, the user sets
the timing bounds for it. Many users might have difficulty
defining timing bounds, especially for specifications with
temporal operators such as Eventually Always ($\Diamond\Box$) and
Repeatedly Often and Finally ($\Box\Diamond$). To illustrate the process,
the tool provides a fill-in-the-blanks sentence format to
the user. For example, if the operator Eventually Always
is selected, the user will have to complete the following
sentence with the timing bounds: "Eventually, between ___
and ___ seconds, the signal will become true, and from that
point on, will stay true in the next ___ to ___ seconds".

The set of timing intervals are visualized with color shaded 
regions in the template. 
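
The mapping from the four blanks of the Eventually Always sentence to an MTL formula can be sketched as follows. This is a hypothetical illustration, not ViSpec's implementation: the function name `eventually_always` and the ASCII notation (`<>` for eventually, `[]` for always) are assumptions made for this example.

```python
def eventually_always(ev_lo: float, ev_hi: float,
                      al_lo: float, al_hi: float,
                      predicate: str) -> str:
    """Build an MTL string from the four blanks of the sentence template.

    ev_lo, ev_hi: bounds of the 'eventually' interval (first two blanks).
    al_lo, al_hi: bounds of the 'always' interval (last two blanks).
    """
    for lo, hi in ((ev_lo, ev_hi), (al_lo, al_hi)):
        if lo < 0 or hi < lo:
            raise ValueError(f"invalid timing interval [{lo},{hi}]")
    return f"<>_[{ev_lo},{ev_hi}] []_[{al_lo},{al_hi}] ({predicate})"


# "Eventually, between 0 and 30 seconds, the signal will become true, and
# from that point on, will stay true in the next 0 to 10 seconds."
print(eventually_always(0, 30, 0, 10, "speed > 100"))
# <>_[0,30] []_[0,10] (speed > 100)
```

Validating the bounds at this point mirrors the idea that the sentence format guides the user toward well-formed intervals before any formula is produced.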

The next step in the process is in defining whether the 
predicate will evaluate to true when the signal is above 
or below a set threshold. For example, for the Always (□) 
operator, a signal is selected that is either always above or 
below a specified threshold. Once either option is selected, 
various signals that fit the requirement are automatically 
generated and presented visually. Instead of drawing the 
signal, the user will select from one of the generated options. 
Consider the following example: 

Example 1: A specification from the fragment of MTL formulas
called Safety MTL specifications is presented. Specifically,
the specification $\phi_1 = \Box_{[0,36]}(rpm < 4000)$. The
formula states that in the next 36 seconds, engine speed
should always be less than 4000. The corresponding graphical
formalism for this formula is presented in Fig. 2. Note
that, in regards to the specification, the signal can be of any
shape as long as it is always below the 4000 threshold.

Consider the following example for the At Least Once (O) 
temporal operator: 

Example 2: A specification from the fragment of MTL formulas
called Reachability MTL specifications is presented.
Specifically, the specification $\phi_2 = \Diamond_{[0,39]}(speed > 100)$.
The formula states that eventually, within the next 39 seconds,
the vehicle speed will go over 100. The corresponding
graphical formalism for this formula is presented in Fig. 3.
Again, in regards to the specification, the signal can be of
any shape as long as at one point, within the timing bounds
of the temporal operator, it is above the 100 threshold.



Fig. 2: Example 1: The graphical formalism for the Safety
MTL specification $\phi_1 = \Box_{[0,36]}(rpm < 4000)$.

Fig. 3: Example 2: The graphical formalism for the Reachability
MTL specification $\phi_2 = \Diamond_{[0,39]}(speed > 100)$.

For the Eventually Always ($\Diamond\Box$) operator, at least once
in the timing interval of the eventually operator, the signal
should go above the threshold and stay there for the entire
timing interval of the always operator. Two types of shading
indicate the timing bounds of the MTL operators.



























Fig. 4: Example 3: The graphical formalism for the MTL
specification $\phi_3 = \Diamond_{[0,30]}\Box_{[0,10]}(speed > 100)$.

Example 3: Consider the specification
$\phi_3 = \Diamond_{[0,30]}\Box_{[0,10]}(speed > 100)$. The formula states that
at some point in the first 30 seconds, the vehicle speed
will go over 100 and stay above it for 10 seconds. The
corresponding graphical formalism for this formula is
presented in Fig. 4.

For the Repeatedly Often and Finally ($\Box\Diamond$) operator, an
oscillating signal is presented where two types of shading
indicate the timing intervals for each MTL operator. Consider
the following example:

Example 4: The specification $\phi_4 = \Box_{[0,30]}\Diamond_{[0,10]}(speed > 100)$
is presented. The formula states that at every timestep
of the simulation in the first 30 seconds, the speed will go
over 100 within the next 10 seconds. The corresponding
graphical formalism for this formula is presented in Fig. 5.
No matter how far to the left or right the green shaded
region is moved, contained within the orange region, there
is always a point where the signal is above the threshold.
Recall that the signal is automatically generated so that it
satisfies the options previously selected.
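
The sliding-window intuition behind this example can be checked numerically. The sketch below is an illustration under stated assumptions, not the tool's signal generator: signals are sampled once per second, the trace is long enough to cover every window, and `recurrence_holds` is a hypothetical helper name.

```python
from typing import Sequence


def recurrence_holds(signal: Sequence[float], threshold: float,
                     outer: int, inner: int) -> bool:
    """Check always_[0,outer] eventually_[0,inner] (signal > threshold).

    For every start time in [0, outer], the window of length `inner`
    that begins there must contain at least one sample above the threshold.
    """
    for start in range(outer + 1):
        window = signal[start:start + inner + 1]
        if not any(value > threshold for value in window):
            return False
    return True


# Hypothetical oscillating signals: a peak every 8 s and a peak every 12 s.
fast = [120 if k % 8 == 0 else 60 for k in range(41)]
slow = [120 if k % 12 == 0 else 60 for k in range(41)]

print(recurrence_holds(fast, 100, outer=30, inner=10))  # True
print(recurrence_holds(slow, 100, outer=30, inner=10))  # False
```

The slower signal fails because the window that starts at second 1 ends at second 11 and therefore misses both the peak at 0 and the peak at 12.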



Fig. 5: Example 4: The graphical formalism for the MTL
specification $\phi_4 = \Box_{[0,30]}\Diamond_{[0,10]}(speed > 100)$.

The next important concept in this graphical formalism is 
the relationship between templates. 

First, the sequence relationship between two templates is 
presented. Assume that the first template is already created. 
If another template is added below it, then an order in the 
execution of the events is defined. The second template is 
only considered if the first template is evaluated to true. 
Formally, there is an implication relationship from the first 
template to the second. Consider the following example: 

Example 5: The specification
$\phi_5 = (\Diamond_{[0,40]}(speed > 100)) \rightarrow (\Diamond_{[0,30]}(rpm > 3000))$
is presented. The formula
states that if, within 40 seconds, the vehicle speed is above
100, then within 30 seconds from time 0, the engine speed
should be over 3000. The corresponding graphical formalism
for this formula is presented in Fig. 6.



Fig. 6: Example 5: The graphical formalism for the
MTL specification $\phi_5 = (\Diamond_{[0,40]}(speed > 100)) \rightarrow (\Diamond_{[0,30]}(rpm > 3000))$.

A second type of relationship enables the user to establish 
conjunction between two events. To achieve this, templates 
can be grouped. This is indicated by a bold black box. Doing 
so requires that both templates evaluate to true. Consider the 
following example: 

Example 6: Specification
$\phi_6 = (\Box_{[0,40]}(speed < 100)) \wedge (\Box_{[0,40]}(rpm < 4000))$.
The formula states that, for the first 40
seconds, the vehicle speed should always be less than 100 and
the engine speed should always be under 4000. The corresponding
graphical formalism for this formula is presented in Fig. 7.

The third type of template relationship enables the user 
to establish relative timing between two templates. Consider 
the following example: 

Example 7: Specification
$\phi_7 = \Box_{[0,40]}((speed < 80) \rightarrow \Box_{[0,40]}(rpm < 4000))$.
Here, the nested specification
$\Box_{[0,40]}(rpm < 4000)$ is evaluated every time $(speed < 80)$
is true. This formula is represented in the formalism with
nested templates, otherwise referred to as parent and child
templates. The second template is tabbed and connected to
the first template using a green indicator. In the GUI, such
a nested template is initiated by clicking on the signal of the
parent template. The corresponding graphical formalism is
presented in Fig. 8.

The variety of templates and the connections between 
them allow users to express a wide variety of specifications. 

III. Graphical Formalism 

The specification development process in ViSpec is divided
into two sub-processes. First, a user input in the
ViSpec tool is translated to a tree structure where the
nodes contain template information such as temporal operators,
their corresponding timing parameters, the group, and the
value threshold for the predicates. Second, the generated
tree structure is traversed by a recursive algorithm to generate
the MTL formula. There is a bijection between the visual
representation of a specification and the MTL formula. An
overview of the process is provided in Fig. 10.

An example of the tree structure for the MTL formula
$\phi = \Box(a \wedge \Diamond b) \rightarrow (\Box c \wedge \Diamond(d \rightarrow (a \wedge \Box b)))$ is shown in Fig. 11.
The recursive algorithm for traversing the tree structure
and generating the MTL formula is presented in Alg. 1.
Note that the functions addParenConn{A,B,C,D} add
the parentheses and connectives between predicates.
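
To make the two-step process more tangible, the following Python sketch builds a small template tree and translates it recursively to an MTL string. It is a simplified stand-in written for illustration and is not the authors' Algorithm 1: the `Template` class, the function names, the ASCII operators (`[]`, `<>`, `&`, `->`), and the rule "same group means conjunction, different group means implication" are assumptions distilled from the examples in Section II.

```python
from dataclasses import dataclass, field
from itertools import groupby
from typing import List


@dataclass
class Template:
    op: str     # temporal operator with bounds, e.g. "[]_[0,40]"
    pred: str   # predicate string, e.g. "speed < 80"
    group: int  # templates sharing a group number are conjoined
    children: List["Template"] = field(default_factory=list)  # nested templates


def write_template(t: Template) -> str:
    """Translate one template, recursing into nested (child) templates."""
    if not t.children:
        return f"{t.op}({t.pred})"
    nested = write_sequence(t.children)
    # Child in the parent's group: conjunction; otherwise: implication.
    conn = "&" if t.children[0].group == t.group else "->"
    return f"{t.op}(({t.pred}) {conn} {nested})"


def write_sequence(templates: List[Template]) -> str:
    """Translate vertically stacked templates.

    Consecutive templates with the same group are joined by conjunction;
    the resulting blocks are chained by right-nested implications.
    """
    blocks = []
    for _, members in groupby(templates, key=lambda t: t.group):
        parts = [write_template(m) for m in members]
        blocks.append(parts[0] if len(parts) == 1 else "(" + " & ".join(parts) + ")")
    formula = blocks[-1]
    for block in reversed(blocks[:-1]):
        formula = f"({block} -> {formula})"
    return formula


# Nested templates in the style of Example 7.
nested = Template("[]_[0,40]", "speed < 80", 1,
                  [Template("[]_[0,40]", "rpm < 4000", 2)])
print(write_sequence([nested]))
# []_[0,40]((speed < 80) -> []_[0,40](rpm < 4000))
```

Grouping first and chaining implications afterwards keeps a grouped pair of templates together as one conjunct, which matches the bold-box grouping described for Example 6.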

IV. Usability Study 


A. Hypotheses 

The aim of the study is to evaluate whether ViSpec 
enables users to develop formal specifications. Two groups 
were considered: 

1) Non-expert users: These are users who declared that 
they have no experience in working with requirements. 

2) Expert users: These are users who declared that they 
have experience working with system requirements. 
Note that they do not necessarily have experience in 
writing requirements using formal logics. 

Some of the interesting questions we wanted to investigate,
which are also presented as hypotheses in Tab. II, are:

• Whether the graphical formalism enables non-experts 
and experts to formalize requirements accurately. 

• How well the expert cohort performs in comparison to 
the non-expert cohort. 

• How user friendly and easy-to-use ViSpec is. 

Writing formal requirements is a challenging task that
requires a significant amount of training. Therefore, we
assume, based on our informal experience, that Hypothesis 1a
can be rejected. Hypothesis 2a will be tested in
future work. In addition, we analyze user interaction and
behavior to measure the ease-of-use of the tool.



Fig. 7: Example 6: The graphical formalism for the MTL
specification $\phi_6 = (\Box_{[0,40]}(speed < 100)) \wedge (\Box_{[0,40]}(rpm < 4000))$.


Algorithm 1 WriteMTL - Algorithm for generating the MTL
formula given a tree structure of the graphical formalism

Input: Tree structure T = (V, E) where v ∈ V and v =
(G, Op, S), where G is the group, Op is the temporal
operator and S is the predicate string; string φ

Output: φ

function WRITEMTL(T, φ)
    C ← T.getChildren
    sC ← size(C)
    for node i in C do
        φ ← CONC(φ, i.Op)
        if i.isParent then
            if not(i.S.isEmpty) then
                subC ← T.getChildren(i)
                if i.G == subC(1).G then
                    φ ← CONC(φ, '∧')
                else
                    if i.isParent then
                        φ ← CONC(φ, '(', i.S, '→ (')
                    else
                        φ ← CONC(φ, …)
                    end if
                end if
                φ ← WRITEMTL(i.subtree, φ)
                if i.isParent then
                    if i.G == subC.G then
                        φ ← CONC(φ, ')')
                    else
                        φ ← CONC(φ, '))')
                    end if
                else
                    if sC > 1 and i ≠ sC then
                        φ ← CONC(φ, ') →')
                    else
                        φ ← CONC(φ, ')')
                    end if
                end if
            else
                φ ← CONC(φ, '(')
                φ ← WRITEMTL(i.subtree, φ)
                if i ≠ sC then
                    φ ← CONC(φ, ') →')
                else
                    φ ← CONC(φ, ')')
                end if
            end if
        else
            φ ← CONC(φ, i.S)
            if i ≠ sC then
                φ ← CONC(φ, '∧')
            else
                φ ← CONC(φ, '→')
            end if
        end if
    end for
end function



























TABLE II: Hypotheses and test results with level of significance $\alpha = 0.05$. User groups as defined in Section IV-A.

| Hypothesis | Reject null hypothesis |
|---|---|
| 1a. Non-expert users are able to define formal requirements accurately using formal logics such as MTL. | |
| 1b. Non-expert users are able to define formal requirements accurately using the Visual Specification Tool. | Yes |
| 2a. Expert users from the industry are able to define formal requirements accurately using formal logics such as MTL. | |
| 2b. Expert users from the industry are able to define formal requirements accurately using the Visual Specification Tool. | Yes |
| $3_{alt}$. The mean grade per user for expert users is greater than the mean grade per user for non-expert users. | Yes |
| $Tx_{alt}$. The mean grade per task x for industry users is greater than the mean grade per task x for non-expert users. | Partially |



Fig. 8: Example 7: The graphical formalism for the MTL
specification $\phi_7 = \Box_{[0,40]}((speed < 80) \rightarrow \Box_{[0,40]}(rpm < 4000))$.

B. Demographics 

The non-expert cohort consisted of twenty subjects
from the student community of Arizona State University.
Most of the subjects are from an engineering background
with little to no experience working with requirements. The
student demographics are presented in Tab. III.

The expert subject cohort consisted of ten subjects
from the industry in the Phoenix area. The subjects have
experience working with specifications and come from an
engineering background.


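TABLE III: Hypothesis 1b Subject Demographics

| Year | Count | Major | Count | Gender | Count |
|---|---|---|---|---|---|
| Freshman | 2 | Computer Science | 5 | Male | 19 |
| Sophomore | 2 | Software Engineering | 3 | Female | 1 |
| Junior | 5 | Electrical Engineering | 3 | | |
| Senior | 5 | Mechanical Engineering | 6 | | |
| Masters | 4 | Engineering, other | 3 | | |
| PhD | 2 | | | | |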




C. Experimental Design 

Each subject received a task list to complete. The task
list contained ten tasks related to automotive system specifications.
Each task asked the subject to formalize a natural
language specification through ViSpec and generate an
MTL formula. The list of tasks is presented in Table VI.

The tasks become more complex throughout the session.
The higher the number of the task, the more steps are necessary
to complete the task successfully.

Each session was at most 45 minutes long. Subjects received
a 90-second tutorial on using ViSpec to
develop specifications. The computer screen was recorded
and actions were logged for each session. The subjects also
completed a demographic and post-completion questionnaire.

D. Metrics 

Two metrics are used for performance evaluation: 

Task completion: this is a binary measure, which indicates
whether users were able to finish the task within the set time.

Measure of Accuracy: a value from one to five which is
used to quantify the accuracy of subject generated formulas.
The formulas are graded by formal specification experts
who were given the following two suggested criteria: a)
How accurately the meaning of the natural language specification
is captured, and b) Whether the inaccuracies in the user
submitted formula can be easily debugged and corrected in
the testing and verification process. Furthermore, in order to
decrease subjectivity, the following instructions were given
to the expert graders in order to anchor the meanings of
the five different grades of the scale used: A grade of one
indicates that the generated formula is totally inaccurate. A
grade of two indicates that the formula is mostly inaccurate.
A grade of three indicates an inaccurate formula which can
be easily debugged and corrected to the proper formal logic
specification by formal specification experts; this is
the minimum acceptable satisfactory result. A grade of four
indicates that the formula is inaccurate but can be debugged
and improved by automated specification debugging tools.
A grade of five indicates that the generated formula is
completely accurate. The group of expert graders consisted
of experts in formal methods and logic.

V. Results 

1) Average grade per task: For both cohorts, the task
performance is presented in Fig. 9. It can be observed that
overall, the mean grade per task for both cohorts is high.
Consider the mean grade per task as a random variable $X$.
Specifically, $X : \Omega \rightarrow \mathbb{R}$, taking values in $\{x : 1 \le x \le 5\}$.
In Fig. 12, we present the survival function
$S_X(x) = 1 - F_X(x) = 1 - P(X \le x) = P(X > x)$ based on sample
data. Note that $x$ is the threshold of mean grade accuracy.
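
An empirical survival function of this kind is simply the fraction of observations that exceed a threshold. The sketch below illustrates the computation in Python; the function name `empirical_survival` and the grade values are hypothetical and are not the study data.

```python
from typing import Sequence


def empirical_survival(samples: Sequence[float], x: float) -> float:
    """Estimate P(X > x) as the fraction of samples strictly above x."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(1 for s in samples if s > x) / len(samples)


# Hypothetical mean grades (one value per task), for illustration only.
grades = [4.9, 4.8, 4.6, 4.5, 4.4, 4.2, 4.0, 3.9, 3.5, 4.7]

print(empirical_survival(grades, 3.0))  # 1.0
print(empirical_survival(grades, 4.0))  # 0.7
print(empirical_survival(grades, 5.0))  # 0.0
```

Evaluating the function over a grid of thresholds between 1 and 5 produces a step curve of the kind described for the survival plot.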

2) Hypothesis 1b: To test Hypothesis 1b, we need to
establish an acceptable threshold for accuracy.
As discussed in the metrics
section, we claim that a mean grade higher than three
is an acceptable threshold for non-expert users. Therefore,
Hypothesis 1b is reduced to the null hypothesis: the mean
grade per user is less than or equal to three for non-experts.

Let us define the average grade per user as a random
variable $Y$. Specifically, $Y : \Omega \rightarrow \mathbb{R}$, taking values in
$\{y : 1 \le y \le 5\}$. The sample data from 20 subjects has a mean
grade of 4.43 and standard deviation of 0.41. We test for
normality with the Kolmogorov-Smirnov test, the Chi-square
goodness-of-fit test, and the Anderson-Darling test, and all three fail to
reject the null hypothesis that the data follows the normal
distribution. In the accompanying figure, we plot the non-expert data against
a fitted normal distribution and the corresponding Q-Q plot.
If we assume that the data constitute a random sample from
a normal distribution, i.e., $Y \sim \mathcal{N}$, we can use the t-statistic
to test the hypothesis. We reject the null hypothesis with a
p-value very close to 0.

Fig. 9: Subject accuracy grades over tasks for both the expert and non-expert cohorts. Panels: bar plot of mean grade and std. dev. over tasks for non-expert users; bar plot of mean grade and std. dev. over tasks for expert users.

Fig. 10: The specification development process using
ViSpec.

Fig. 11: The corresponding tree structure for formula
$\phi = \Box(a \wedge \Diamond b) \rightarrow (\Box c \wedge \Diamond(d \rightarrow (a \wedge \Box b)))$, where $a$, $b$, $c$ and
$d$ are predicates. Each node is composed of a node name,
group number, temporal operator, and predicate. The symbol
$\epsilon$ indicates empty parameters.

3) Hypothesis 2b: Similarly, we test Hypothesis 2b for
the expert cohort. Hypothesis 2b is reduced to the null
hypothesis: the mean grade per user is less than or equal to
three for expert users. We test for normality as in the previous
case, and all three tests fail to reject the null hypothesis that
the data follows the normal distribution.

Consider the average grade per user as a random variable
$Z$. Specifically, $Z : \Omega \rightarrow \mathbb{R}$, taking values in $\{z : 1 \le z \le 5\}$.
The sample data from 10 subjects has a mean grade of 4.76
and standard deviation of 0.26. In the accompanying figure, we plot the
expert data against a fitted normal distribution and the
corresponding Q-Q plot. If we assume that the data constitute
a random sample from a normal distribution, i.e., $Z \sim \mathcal{N}$, we
can use the t-statistic to test the hypothesis. We reject the null
hypothesis with a p-value very close to 0.

Fig. 12: Top: The empirical probability that the mean grade
per user is greater than threshold $x$ for the non-expert and
expert subjects, i.e., $P(Y > x)$.
Bottom: The empirical probability that the mean grade per
task is greater than threshold $x$ for the non-expert and expert
subjects, i.e., $P(X > x)$.
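
The two one-sample tests above can be sketched from the reported summary statistics alone. The Python code below is an illustration, not the authors' analysis script: it assumes the reported standard deviations are sample standard deviations, and it hard-codes the standard one-sided critical values of the t-distribution for $\alpha = 0.05$ (1.729 for 19 degrees of freedom, 1.833 for 9 degrees of freedom).

```python
import math


def one_sample_t(mean: float, std: float, n: int, mu0: float) -> float:
    """t-statistic for H0: population mean <= mu0, from summary statistics."""
    return (mean - mu0) / (std / math.sqrt(n))


# One-sided critical values at alpha = 0.05 from a standard t-table.
T_CRIT_DF19 = 1.729  # n = 20 non-expert subjects
T_CRIT_DF9 = 1.833   # n = 10 expert subjects

t_non_expert = one_sample_t(mean=4.43, std=0.41, n=20, mu0=3.0)
t_expert = one_sample_t(mean=4.76, std=0.26, n=10, mu0=3.0)

print(round(t_non_expert, 2), t_non_expert > T_CRIT_DF19)  # 15.6 True
print(round(t_expert, 2), t_expert > T_CRIT_DF9)           # 21.41 True
```

Both statistics lie far beyond the critical values, which is consistent with the reported p-values very close to 0.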


Task completion time for non-expert and expert cohorts
Fig. 13: Example 2: The graphical formalism for the Reachability
MTL specification $\phi_2 = \Diamond_{[0,39]}(speed > 100)$.


4) Hypothesis $3_{alt}$: To test Hypothesis $3_{alt}$, we conduct
a two-sample t-test. The p-value returned from the test is
0.0024 and, for a significance level of 0.01, we reject the
null hypothesis. Therefore, we claim that the mean grade per
user for expert users is greater than the mean grade per user
for non-experts.

5) Hypothesis $Tx$: Next, we compare the mean grades of
both cohorts on each task. A two-sample t-test
is conducted for each task. The results for the tests are
presented in Tab. V. Task 9 is the most difficult task when
it comes to the number of errors generated, and this is the
only task where there is a clear difference in performance
between the expert and non-expert cohorts.

TABLE IV: ViSpec improvements

| # | Improve... | Prime Indicators |
|---|---|---|
| 1. | the process of creating child templates | misclicks; user feedback |
| 2. | the tutorial by placing more emphasis on the difference between implication and conjunction between templates | task accuracy grade |
| 3. | the visual representation of grouped templates | task accuracy grade; user feedback |

TABLE V: Hypothesis testing of $Tx_{null}$ with $\alpha = 0.05$

| x | Reject $Tx_{null}$ | p-val. | Conclusion |
|---|---|---|---|
| 4 | No | 0.065 | potentially true with more investigation |
| 5 | No | 0.165 | false |
| 6 | No | 0.074 | potentially true with more investigation |
| 7 | No | 0.100 | potentially true with more investigation |
| 8 | No | 0.424 | false |
| 9 | Yes | 0.016 | true |
| 10 | No | 0.063 | potentially true with more investigation |


We observe that the only null hypothesis rejected is the one for
task nine, indicating that for this task the mean grade for expert users is
greater than the mean grade for non-expert users. The subject
accuracy grades over tasks for both cohorts are shown in Fig. 9.

6) Ease-of-use analysis: One indicator for the ease-of-use 
of the application is the total time spent per task. As can be 
observed in the corresponding figure, the mean time spent per task 
is at most 167 seconds. For easier identification of points of 
difficulty, we divided each task into subtasks. We observed 
no correlation between the length of time spent 
in a subtask and correctness. This potentially indicates, as 
also verified by correlation testing between times and grades, 
that the subjects were unaware of mistakes in the process. 
From these and other observations, such as misclicks and 
subject feedback, we have developed a set of refinements 
to the tool to improve the user experience. A partial list of 
improvements is presented in Table IV. 
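
The correlation testing between times and grades mentioned above can be illustrated with a Pearson correlation coefficient. This is a sketch only: the text does not say which correlation test was used, and the times and grades below are made-up values for illustration, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL data: seconds spent in a subtask and the grade obtained (0..1)
times = [40.0, 55.0, 70.0, 85.0, 100.0, 115.0]
grades = [0.9, 0.6, 1.0, 0.7, 1.0, 0.6]

r = pearson_r(times, grades)
print(f"Pearson r between time and grade: {r:.3f}")
```

A coefficient close to zero would be consistent with the observation that time spent in a subtask does not predict correctness.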


VI. Applications 


A. Robotic Surgery 

In the last few decades, there has been a significant 
increase in the number of robotic systems, especially in 
health care. They have been successfully introduced 
in multiple areas such as rehabilitation, telesurgery, physical 
therapy, elderly care, and remote physician care. In the 
following, we focus on autonomous robotic systems for 
surgery, where the safety of these systems is of paramount 
importance [13]. Specifically, we consider a model of a 
robotic serial link manipulator as presented in [17]. 

One of the main tasks in surgery is the puncturing action. 
The high precision and repeatability of the process make 
robot systems ideal for this task. Also, the trauma induced 
around the region is much lower and therefore the recovery 
process for the patient is quicker. To complete the puncturing 
action, the robot has to move towards the puncturing location, 
test the tissue for various indicators to calibrate for optimal 
puncture, bring the puncturing needle to a perpendicular 
position and, finally, puncture with correct force and angle. If 
the force or angle is miscalculated, it might cause unintended 
harm to the patient. Consider the specifications from [17] 
that should hold on a serial manipulator for puncturing: 


1) From [17]: The force applied to the patient by the end 
effector is always less than a given threshold, except 
for the puncturing subtask. Formally, assuming that 
the operation time is 30 seconds, we have: 
$\phi_{s1} = \Box_{[0,30]}(\neg puncturing \rightarrow f < f_{max})$. 

2) From [17]: The task is feasible, and the position of 
the needle once it stops is inside the target region 
R. Formally, assuming that the operation time is 40 
seconds, we have: 
$\phi_{s2} = \Diamond_{[0,40]}(stop \wedge needle \in R)$. 

3) Also, other requirements can be expressed for such a 
system. For example, the end effector speed should not 
be less than $v_{min}$ and should not be greater than $v_{max}$. 
Formally: $\phi_{s3} = \Box_{[0,40]}(v_{min} < v_{eff} < v_{max})$. 

The ViSpec tool is utilized to develop the specifications 
for the robotic manipulator. For $\phi_{s1}$, the specification is 
presented in Fig. 15. We assume that $f_{max} = 10$. For $\phi_{s2}$, 
the specification is presented in the corresponding figure. We assume that 
$needle \in R \iff 5 < n_x < 10 \wedge 5 < n_y < 10$, where 
$n_x, n_y$ are the x and y coordinates for the needle. For $\phi_{s3}$, 
the specification is presented in the corresponding figure. We assume that 
$v_{min} = 10$ and $v_{max} = 20$. 
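
To make the three manipulator requirements concrete, the following sketch checks them on a sampled trace. It assumes Boolean semantics over discrete samples, reads the first requirement as "whenever the robot is not puncturing, the force is below the threshold" and the second as "at some time the needle has stopped inside R". The signal names, the helper functions, and the synthetic trace are all hypothetical and are not part of ViSpec; only the constants come from the text.

```python
F_MAX = 10.0                # force threshold assumed in the text
V_MIN, V_MAX = 10.0, 20.0   # speed bounds assumed in the text


def always(trace, lo, hi, pred):
    """Bounded 'always': pred holds at every sample with lo <= t <= hi."""
    return all(pred(s) for s in trace if lo <= s["t"] <= hi)


def eventually(trace, lo, hi, pred):
    """Bounded 'eventually': pred holds at some sample with lo <= t <= hi."""
    return any(pred(s) for s in trace if lo <= s["t"] <= hi)


def in_region(s):
    """Target region R: 5 < n_x < 10 and 5 < n_y < 10."""
    return 5 < s["nx"] < 10 and 5 < s["ny"] < 10


def phi_s1(trace):
    # not puncturing -> f < f_max, written as (puncturing or f < f_max)
    return always(trace, 0, 30, lambda s: s["puncturing"] or s["f"] < F_MAX)


def phi_s2(trace):
    return eventually(trace, 0, 40, lambda s: s["stop"] and in_region(s))


def phi_s3(trace):
    return always(trace, 0, 40, lambda s: V_MIN < s["v_eff"] < V_MAX)


def make_sample(t):
    """Synthetic values for illustration only, not a physical simulation."""
    puncturing = 20 <= t <= 25
    return {
        "t": float(t),
        "puncturing": puncturing,
        "f": 12.0 if puncturing else 3.0,
        "stop": t >= 35,
        "nx": min(7.0, 0.2 * t),
        "ny": min(8.0, 0.25 * t),
        "v_eff": 15.0,
    }


trace = [make_sample(t) for t in range(41)]  # one sample per second
print("phi_s1:", phi_s1(trace))
print("phi_s2:", phi_s2(trace))
print("phi_s3:", phi_s3(trace))
```

Sampling once per second is a simplification: a violation that occurs between two samples would go unnoticed by this check.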


B. Quadcopter 


In recent years, quadcopters and other unmanned aerial 
vehicles (UAVs) have become a major focus for research both 
in the academic community and industry. Among other uses, they 
are employed in military operations, nuclear disaster assessment, 
firefighting and entertainment. The challenges faced in developing 
these devices and their control algorithms come from 
the flight dynamics and the highly dynamic environment 
in which they operate. Also, as the complexity of these devices 
increases, so do the performance and reliability requirements. 

Consider the following specifications for a quadrotor: 


1) The absolute value of the pitch and roll angles should 
always be below certain thresholds. Formally, assuming 
that the operation time is 40 seconds, we have: 
$\phi_{q1} = \Box_{[0,40]}(|\alpha| < \alpha_{max}) \wedge \Box_{[0,40]}(|\beta| < \beta_{max})$. 

2) If the distance to the target region is smaller than a certain 
threshold $d$, then for the next 20 seconds, the 
speed should not exceed $v_{max}$. Formally, assuming that 
the operation time is 40 seconds, we have: 
$\phi_{q2} = \Box_{[0,40]}(dist < d \rightarrow \Box_{[0,20]}(v < v_{max}))$. 


The ViSpec tool is utilized to develop the specifications 
for the quadrotor. For $\phi_{q1}$, the specification is presented in 
Fig. 16. We assume that $\alpha_{max} = 45$ deg, $\beta_{max} = 45$ deg 
and $\gamma_{max} = 60$ deg. For $\phi_{q2}$, the specification is presented 
in the corresponding figure. We assume that $d = 5$ and $v_{max} = 10$. For $\phi_{s3}$, 
the specification is presented in the corresponding figure. We assume that 
$v_{min} = 10$ and $v_{max} = 20$. 

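The second quadrotor requirement nests one bounded operator inside another, so the inner window has to be re-evaluated at every time point of the outer one. The sketch below shows one way to do this with small formula combinators over a sampled trace. It assumes Boolean semantics on integer-second samples, truncates windows at the end of the trace, and takes the first angle threshold to be pitch and the second to be roll. The combinator names, signal names, and synthetic traces are hypothetical and are not part of ViSpec.

```python
A_MAX = 45.0   # pitch threshold in degrees (assumed in the text)
B_MAX = 45.0   # roll threshold in degrees (assumed in the text)
D = 5.0        # distance threshold (assumed in the text)
V_MAX = 10.0   # speed bound for the second requirement (assumed in the text)

# A formula is a function (trace, t) -> bool.
# A trace is a dict mapping sample time -> dict of signal values.


def atom(pred):
    """Atomic proposition: pred applied to the sample at time t."""
    return lambda trace, t: pred(trace[t])


def always(lo, hi, sub):
    """Bounded 'always' relative to the evaluation time t."""
    return lambda trace, t: all(
        sub(trace, u) for u in trace if t + lo <= u <= t + hi
    )


def implies(a, b):
    return lambda trace, t: (not a(trace, t)) or b(trace, t)


def conj(a, b):
    return lambda trace, t: a(trace, t) and b(trace, t)


phi_q1 = conj(
    always(0, 40, atom(lambda s: abs(s["pitch"]) < A_MAX)),
    always(0, 40, atom(lambda s: abs(s["roll"]) < B_MAX)),
)

phi_q2 = always(
    0, 40,
    implies(
        atom(lambda s: s["dist"] < D),
        always(0, 20, atom(lambda s: s["v"] < V_MAX)),
    ),
)


def make_trace(slow_down_at):
    """Synthetic approach to the target; values are for illustration only."""
    return {
        t: {
            "pitch": 10.0,
            "roll": -20.0,
            "dist": max(0.0, 30.0 - t),
            "v": 12.0 if t < slow_down_at else 6.0,
        }
        for t in range(61)  # 40 s horizon plus the 20 s inner window
    }


early = make_trace(slow_down_at=20)  # slows down before getting close
late = make_trace(slow_down_at=30)   # still fast when dist < d
print("phi_q1 on early trace:", phi_q1(early, 0))
print("phi_q2 on early trace:", phi_q2(early, 0))
print("phi_q2 on late trace:", phi_q2(late, 0))
```

The trace covers 60 seconds because the outer 40-second window combined with the inner 20-second window looks that far ahead; with a shorter trace, the truncated inner windows would be checked only partially.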


TABLE VI: Task list with automotive system specifications presented in natural language 

Task                       Natural Language Specification

1. Safety                  In the first 40 seconds, vehicle speed should always be less than 160. 

2. Reachability            In the first 30 seconds, vehicle speed should go over 120. 

3. Stabilization           At some point in time in the first 30 seconds, vehicle speed will go over 100 and stay above for 20 seconds. 

4. Recurrence              At every point in time in the first 40 seconds, vehicle speed will go over 100 in the next 10 seconds. 

5. Recurrence              It is not the case that, for up to 40 seconds, the vehicle speed will go over 100 in every 10 second period. 

6. Implication             If, within 40 seconds, vehicle speed is above 100 then within 30 seconds from time 0, engine speed should be over 3000. 

7. Reactive Response       If, at some point in time in the first 40 seconds, vehicle speed goes over 80 then from that point on, for the next 30 seconds, engine speed should be over 4000. 

8. Conjunction             In the first 40 seconds, vehicle speed should be less than 100 and engine speed should be under 4000. 

9. Non-strict sequencing   At some point in time in the first 40 seconds, vehicle speed should go over 80 and then from that point on, for the next 30 seconds, engine speed should be over 4000. 

10. Long sequence          If, at some point in time in the first 40 seconds, vehicle speed goes over 80 then from that point on, if within the next 20 seconds the engine speed goes over 4000, then, for the next 30 seconds, the vehicle speed should be over 100. 


[Figure panels: "Non-expert data fit with Normal distribution", "Expert data fit with Normal distribution", "QQ Plot of Sample Data versus Standard Normal"] 

Fig. 14: Subject data fit with a normal distribution and the corresponding Q-Q plot. 



Fig. 15: The graphical formalism for $\phi_{s1}$. 

VII. Conclusion and Future Work 

As robots and other cyber-physical systems become more 
complex and ubiquitous, so does the need for better testing 
and verification. Formal methods that improve 
this process require a formal representation of system 
specifications. In this work, we presented a graphical formalism and a tool 
that enables users to easily develop formal specifications. 
The ViSpec tool enables users who have little to 
no mathematical training in formal logics to develop formal 
specifications, as was verified by a usability study that we 
conducted to evaluate the usefulness of the tool 
and to gain insights into potential improvements. The tool was 
also utilized to formalize specifications for two robots. 

Fig. 16: The graphical formalism for $\phi_{q1}$. 

Last but not least, we would like to investigate whether the potential 
inaccuracies of the specifications that users generate with 
the tool can be attributed mainly to the inherent ambiguity 
of the natural language descriptions that were given and, if 
not, which other factors contribute and to what extent. Thus, 
in an improved usability study, we aim to explore 
alternative methods of eliciting requirements from engineers 
for a system that do not involve the experimenter administering a 
natural language description. This would 
enable us to study to what extent inherent natural language 
ambiguity causes the less-than-perfect accuracy that 
is occasionally observed. 
Fig. 18: The graphical formalism for 
reviewers for the detailed reviews. 